A NEW DIRECT PROOF OF THE CENTRAL LIMIT THEOREM


VLADIMIR DOBRIC AND PATRICIA GARMIRIAN 


Abstract. We prove the Central Limit Theorem (CLT) from the definition of weak convergence
using the Haar wavelet basis, calculus, and elementary probability. The use of the Haar
basis pinpoints the role of $L^2([0,1])$ in the CLT as well as the assumption of finite variance.
We estimate the rate of convergence and prove strong convergence off the tails.


1. Introduction 

The Central Limit Theorem (CLT) is one of the most fundamental theorems of probability
theory. The theorem states that standardized sums of i.i.d. random variables having finite
variance converge weakly to the standard normal distribution. As early as the 1770s, mathematicians
were searching for the "central limit," trying to establish the correct conditions for convergence in
distribution and the formula for the limiting distribution. The connection between convergence
in distribution and characteristic functions was established in the 1920s by Lévy.

The connection between convergence in distribution and weak convergence was established in
the late 1940s (see [6]). For a measurable space $(S, \mathcal{B}(S))$, where $S$ is a Polish space, a sequence
of measures $(\mu_n)$ converges weakly to $\mu$ provided that for each bounded, continuous function
$f : S \to \mathbb{R}$,

\[ \lim_{n \to \infty} \int_S f(x)\, d\mu_n(x) = \int_S f(x)\, d\mu(x). \]

The advantage of using this definition is that it may serve as a stepping-stone to extending the 
CLT to random variables having values in more general spaces. 

A stronger type of convergence for measures than "weak" convergence is "strong" convergence.
For a measurable space $(S, \mathcal{B}(S))$, a sequence of measures $(\mu_n)$ converges strongly to $\mu$ provided
that for each set $A \in \mathcal{B}(S)$,

\[ \lim_{n \to \infty} \mu_n(A) = \mu(A). \]

We show that the type of convergence in the CLT is in fact strong off the tails. 

In 1935, both Feller, [5], and Levy (independently), [8], [9], proved the Central Limit Theorem 
(CLT) using characteristic functions. As the CLT is a fundamental theorem in probability theory, 
since the time of Donsker many experts have believed that there should be a direct proof (see [6] 
for example). Also, there is obvious interest in determining rates and constants of convergence 
and these characteristic function proofs gave no information about these issues. 

Since 1935, there have been a number of more elementary or direct proofs of the CLT which
do not use characteristic functions, e.g., [1], [2], [3], [4], [10] and [11]. The last two prove the
CLT directly from the definition of weak convergence, and [2] and (taken together) [1], [3], [4] do
give a rate of convergence of $n^{-1/2}$ as well as a constant of convergence. However, all of these
proofs involve a hypothesis stronger than the optimal hypothesis of Feller and Lévy, namely finite
variance. The hypothesis of [1], [2], [3], [4] and [11] is finite third moment, while the hypothesis
of [10] is continuous second derivative of the function $f$ in the definition of weak convergence.

We present a proof, directly from the definition of weak convergence, avoiding characteristic
functions, which is optimal in terms of hypothesis (as in [5], [8], [9]). We elaborate on the next 
complex of notions in what follows, but for now, it will suffice to say that the proof involves 
multinomial approximations to the initial sum of random variables, and identifies “tails” of both 
these multinomials and the Gaussian. We obtain strong convergence off these tails, i.e. the 
sum of the absolute values of the differences between the multinomial and Gaussian probabilities 
converges to zero. Our proof also provides estimates of the rate and constant of convergence off 
these tails; the former is comparable to the rates of [2] and ([1], [3], [4]). 

The translation into the language of random variables of the above definition of weak
convergence is established by letting $\mu_n = P \circ X_n^{-1}$. A sequence of random variables $(X_n)$ converges
weakly to a random variable $X$ if for each bounded, continuous function $f : \mathbb{R} \to \mathbb{R}$,

\[ \lim_{n \to \infty} E(f(X_n)) = E(f(X)), \]

and it is this translation that we use in what follows. A sequence of random variables $X_n$ on a
measurable space $(S, \mathcal{B}(S))$ converges strongly to a random variable $X$ if for each $A \in \mathcal{B}(S)$,

\[ \lim_{n \to \infty} P(X_n \in A) = P(X \in A). \]

In the case where $X_n$ and $X$ are discrete with range set $J$, by the triangle inequality, it is sufficient
to show that

\[ \lim_{n \to \infty} \sum_{j \in J} |P(X_n = j) - P(X = j)| = 0. \]

In fact, approximating the initial sum of random variables and the Gaussian by discrete versions,
the preceding statement holds for the sequences in the CLT off the tails.
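
As a small illustration of the definition of weak convergence used here, the following sketch evaluates $E(f(S_n/\sqrt{n}))$ exactly for a sum $S_n$ of fair $\pm 1$ steps and the test function $f = \cos$, and compares it with $E(\cos Y) = e^{-1/2}$ for a standard normal $Y$. The choice of step distribution, the test function, and the helper name `expected_f_of_standardized_sum` are assumptions made for the example; they are not part of the proof.

```python
import math

def expected_f_of_standardized_sum(f, n):
    """Exact E[f(S_n / sqrt(n))] for S_n a sum of n fair +/-1 steps."""
    total = 0.0
    for k in range(n + 1):                      # k = number of +1 steps
        prob = math.comb(n, k) / 2 ** n         # Binomial(n, 1/2) probability
        total += prob * f((2 * k - n) / math.sqrt(n))
    return total

limit = math.exp(-0.5)                          # E[cos(Y)] for a standard normal Y
for n in (10, 100, 400):
    # The gap to the Gaussian value shrinks as n grows.
    print(n, abs(expected_f_of_standardized_sum(math.cos, n) - limit))
```

Only one bounded continuous $f$ is checked here, so this illustrates the definition rather than verifying it.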

Our proof employs the expansion of random variables on [0,1] (equipped with Lebesgue
measure, on Borel sets) with respect to the Haar basis. The Haar basis is the simplest orthonormal
system for $L^2([0,1])$. By considering random variables in $L^2([0,1])$, this proof is consistent with
the assumption of finite variance. Therefore, the Haar basis is a natural tool for proving the
CLT.

The proof proceeds as follows: Given an i.i.d. sequence of random variables on a probability 
space, we construct an i.i.d. sequence on [0,1] with the Borel sigma algebra and Lebesgue measure 
having the same sequence of distributions. As the new sequence of random variables is defined 
on [0,1] and also has finite variance, we then expand this sequence with respect to the Haar basis. 

We then reduce the problem of showing weak convergence of this new sequence of random
variables to the case where the Haar expansions are truncated to have only $M$ terms, for some
finite $M$ which will be chosen to accomplish certain other objectives (Lemma 1). These truncated
Haar expansions each have $m = 2^{M+1}$ possible outcomes. Next, in Proposition 1, we show that
the sum of Haar expansions having only $M$ terms is in fact the projection of a multinomial
random variable.

In Lemma 2, we identify (via the constant $b_0$ introduced there) the tails of the multinomial
random variable. After cutting off these tails, we compute the probabilities for the multinomial
distribution using Stirling's formula and Taylor series approximation (Lemma 3). The
appearance of the Gaussian density on the multinomial side can be seen in this step.

On the Gaussian side, we express a standard normal random variable as a sum of $m$
independent normal random variables with coefficients being the outcomes of the truncated Haar
expansion. We then apply Fubini's Theorem to reduce by one dimension the expression for the
expected value on the Gaussian side as an integral over a hyperplane in $\mathbb{R}^m$ (Lemma 4). In
Lemma 5, we identify (via the constant $b_1$ introduced there) the tails of the Gaussian. After
cutting off these tails, we approximate the integral by a Riemann sum. In Proposition 2, we
pull together the results of Lemmas 4 and 5. The Riemann and the multinomial sums match
perfectly.

In Theorem 1, by bounding the function $f$ by its sup norm, we estimate the sum of the absolute
values of the differences between the multinomial and Gaussian probabilities, establishing strong
convergence off the tails. It is here that we also obtain the rate of convergence of $n^{-1/2}$ and
the constant for convergence of $\frac{2m^{9/2}}{3\sqrt{2\pi}}$, also off the tails. In both instances, the restriction to "off
the tails" arises since our truncations (of the Haar expansions, the multinomial sum, and the
Gaussian Riemann sum) are based on Chebyshev's inequality, in which coarseness is the price of its
generality. Finally, in Theorem 2, we pull together the preceding results to prove the CLT.


2. Preliminary Estimates 

Let $\varepsilon > 0$. Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded, continuous function. Let $Z$ be a random variable on
a probability space $(\Omega, P)$. We may assume that $E(Z) = 0$ and $\mathrm{var}(Z) = 1$. Define the quantile
of $Z$ to be the function $X : [0,1] \to \mathbb{R}$ defined by

\[ X(x) := \inf\{y \in \mathbb{R} \mid P(Z \le y) > x\}. \]

Then, $X$ is a random variable on [0,1] (equipped with Lebesgue measure, on Borel sets) having
the same distribution as $Z$.
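
The quantile construction can be sketched for a discrete $Z$ as follows. The three-point law used below (values $-2, 0, 2$ with probabilities $1/8, 3/4, 1/8$, so mean 0 and variance 1) and the function name `quantile` are assumptions chosen for illustration.

```python
def quantile(values, probs, x):
    """X(x) = inf{y : P(Z <= y) > x} for a discrete Z; `values` must be sorted."""
    cumulative = 0.0
    for value, prob in zip(values, probs):
        cumulative += prob
        if cumulative > x:
            return value
    return values[-1]          # only reached at x = 1, a Lebesgue-null set

values, probs = [-2.0, 0.0, 2.0], [0.125, 0.75, 0.125]
grid = [(i + 0.5) / 1000 for i in range(1000)]          # midpoints in [0, 1]
sample = [quantile(values, probs, x) for x in grid]
print(sum(sample) / 1000, sum(v * v for v in sample) / 1000)   # mean and second moment
```

Averaging over an even grid of $[0,1]$ plays the role of integrating against Lebesgue measure, so the printed mean and second moment should agree with those of $Z$.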

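For $x \in (0,1)$, let $\varepsilon_i(x)$ be the $i$th bit in the binary expansion of $x$ (for dyadic rationals, choose
the expansion with the tail of 0's). We create the following matrix of binary digits:

\[
\begin{pmatrix}
\varepsilon_1 & \varepsilon_3 & \varepsilon_6 & \cdots \\
\varepsilon_2 & \varepsilon_5 & \varepsilon_9 & \cdots \\
\varepsilon_4 & \varepsilon_8 & \varepsilon_{13} & \cdots \\
\vdots & \vdots & \vdots &
\end{pmatrix}
\]

For all $x \in (0,1)$, define $P_i(x)$ to have binary expansion given by the $i$th column of the matrix.
Let $X_i(x) := X(P_i(x))$. Then, $(X_i)$ is an i.i.d. sequence of random variables on [0,1] having the
same distribution as $X$.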
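
The digit matrix is filled along anti-diagonals, so the entry in row $r$ and column $c$ is the digit with index $(d-1)d/2 + c$, where $d = r + c - 1$. The sketch below uses that indexing to approximate $P_i(x)$ from finitely many digits. The function names and the cutoff `rows` are assumptions for the example, and floating-point inputs only carry about 53 binary digits.

```python
def bit(x, i):
    """i-th binary digit (i >= 1) of x in [0, 1), using the expansion ending in zeros."""
    return int(x * 2 ** i) % 2

def digit_index(row, col):
    """Index in the original expansion of the matrix entry (row, col), both 1-based."""
    d = row + col - 1                  # anti-diagonal containing the entry
    return (d - 1) * d // 2 + col      # digits on earlier diagonals, plus the offset

def P(x, i, rows=8):
    """Approximate P_i(x) using the first `rows` digits of column i."""
    return sum(bit(x, digit_index(r, i)) / 2 ** r for r in range(1, rows + 1))

x = 0.8125                             # binary 0.1101
print([digit_index(r, 1) for r in range(1, 5)])   # column 1 uses digits 1, 2, 4, 7
print(P(x, 1), P(x, 2))
```

Because distinct columns read disjoint sets of digits, the $P_i$ are independent and each is uniformly distributed, which is what makes $(X_i)$ i.i.d.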

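Note that by assumption, $X \in L^2([0,1])$. The Haar basis is the simplest orthonormal system
in $L^2([0,1])$ and consists of the set $S = \{H_{j,k}(x) \mid 0 \le j < \infty,\ 0 \le k \le 2^j - 1\} \cup \{\chi_{[0,1]}\}$, where

\[
H_{j,k}(x) =
\begin{cases}
2^{j/2}, & x \in \left[\frac{k}{2^j}, \frac{2k+1}{2^{j+1}}\right), \\
-2^{j/2}, & x \in \left[\frac{2k+1}{2^{j+1}}, \frac{k+1}{2^j}\right), \\
0, & \text{otherwise.}
\end{cases}
\]

Since $E(X) = 0$ and $\|X\|_2 < \infty$, it follows that

\[ X(x) = \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} c_{j,k} H_{j,k}(x), \]

where $c_{j,k} = \int_0^1 X(x) H_{j,k}(x)\, dx$. Then,

\[ X(x) = \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} c_{j,k} H_{j,k}(x) = \sum_{j=0}^{\infty} 2^{j/2} (-1)^{\varepsilon_{j+1}(x)} c_{j,\lfloor 2^j x \rfloor}, \]

where, as usual, $\lfloor x \rfloor$ denotes the greatest integer $\le x$. For $n \ge 1$, define

\[ S_n(x) := \sum_{i=1}^{n} X_i(x) = \sum_{i=1}^{n} \sum_{j=0}^{\infty} 2^{j/2} (-1)^{\varepsilon_{j+1}(P_i(x))} c_{j,\lfloor 2^j P_i(x) \rfloor}, \]

and for $M \ge 1$, define

\[ S_{n,M}(x) := \sum_{i=1}^{n} X_{i,M}(x) = \sum_{i=1}^{n} \sum_{j=0}^{M} 2^{j/2} (-1)^{\varepsilon_{j+1}(P_i(x))} c_{j,\lfloor 2^j P_i(x) \rfloor} \tag{1} \]

and

\[ \sigma_M := \sqrt{\sum_{j=0}^{M} \sum_{k=0}^{2^j - 1} c_{j,k}^2}. \tag{2} \]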
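
The Haar functions and their orthonormality can be checked numerically with a short sketch. The helper names `haar` and `inner` and the grid size are assumptions for the example; the midpoint rule is exact here because every function involved is constant on the grid cells.

```python
def haar(j, k, x):
    """H_{j,k}(x): +2^(j/2) on the left half of [k/2^j, (k+1)/2^j), -2^(j/2) on the right half."""
    left, mid, right = k / 2 ** j, (2 * k + 1) / 2 ** (j + 1), (k + 1) / 2 ** j
    if left <= x < mid:
        return 2 ** (j / 2)
    if mid <= x < right:
        return -(2 ** (j / 2))
    return 0.0

def inner(u, v, cells=1024):
    """Midpoint-rule inner product on [0, 1]."""
    return sum(u((i + 0.5) / cells) * v((i + 0.5) / cells) for i in range(cells)) / cells

h10 = lambda x: haar(1, 0, x)
h11 = lambda x: haar(1, 1, x)
h00 = lambda x: haar(0, 0, x)
one = lambda x: 1.0
print(inner(h10, h10), inner(h10, h11), inner(h00, h10), inner(one, h10))
```

At a fixed level $j$ only the function with $k = \lfloor 2^j x \rfloor$ is nonzero at $x$, and its sign is set by the $(j+1)$th binary digit of $x$, which is why the double sum over $(j, k)$ collapses to a single sum over $j$.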
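Lemma 1. Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded, continuous function. Then, there exists a positive
integer $M_0$ such that for all $M \ge M_0$:

\[ \left| \int_0^1 f\!\left(\frac{S_n(x)}{\sqrt{n}}\right) d\lambda(x) - \int_0^1 f\!\left(\frac{S_{n,M}(x)}{\sigma_M \sqrt{n}}\right) d\lambda(x) \right| < \varepsilon(6\|f\|_\infty + 1). \]

Proof. Note that $E\!\left(\frac{S_n}{\sqrt{n}}\right) = 0$ and $\mathrm{var}\!\left(\frac{S_n}{\sqrt{n}}\right) = 1$. Let

\[ A := \left\{ \left| \frac{S_n}{\sqrt{n}} \right| > L \right\} \quad \text{and} \quad B_M := \left\{ \left| \frac{S_{n,M}}{\sigma_M \sqrt{n}} \right| > L \right\}. \]

By Chebyshev's inequality,

\[ \lambda(A) \le \frac{1}{L^2} < \varepsilon \quad \text{and} \quad \lambda(B_M) \le \frac{1}{L^2} < \varepsilon \]

for $L$ large. Since $f$ is uniformly continuous on $[-L, L]$, there exists a $\delta > 0$ such that
$x, y \in [-L, L]$ satisfying $|x - y| < \delta$ implies that $|f(x) - f(y)| < \varepsilon$. Now, let

\[ C_M := \left\{ \left| \frac{S_n}{\sqrt{n}} - \frac{S_{n,M}}{\sigma_M \sqrt{n}} \right| > \delta \right\}. \]

There exists an $M_0 \in \mathbb{N}$ such that for all $M \ge M_0$:

\[ \mathrm{var}\!\left( \frac{S_n}{\sqrt{n}} - \frac{S_{n,M}}{\sigma_M \sqrt{n}} \right) \le (1 - \sigma_M^2) + 2(1 - \sigma_M)\sqrt{1 - \sigma_M^2} + (1 - \sigma_M)^2 < \varepsilon \delta^2, \]

and so

\[ \lambda(C_M) < \frac{\varepsilon \delta^2}{\delta^2} = \varepsilon. \]

Now, let $S := A^c \cap B_M^c \cap C_M^c$. Then, $\lambda(S^c) < 3\varepsilon$. Hence,

\[ \left| \int_0^1 f\!\left(\frac{S_n}{\sqrt{n}}\right) d\lambda - \int_0^1 f\!\left(\frac{S_{n,M}}{\sigma_M \sqrt{n}}\right) d\lambda \right| \le 2\|f\|_\infty \lambda(S^c) + \varepsilon \lambda(S) < \varepsilon(6\|f\|_\infty + 1). \]

□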


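Let

\[ X_M(x) := \frac{1}{\sigma_M} \sum_{j=0}^{M} 2^{j/2} (-1)^{\varepsilon_{j+1}(x)} c_{j,\lfloor 2^j x \rfloor} \]

for $x \in [0,1]$. Below, we investigate the properties of this random variable. Note that $X_M$ is
a random variable which depends on $(\varepsilon_1, \ldots, \varepsilon_{M+1})$. From now on, we will let $m := 2^{M+1}$ for
notational convenience. Thus, $X_M$ is constant on dyadic intervals of length $2^{-(M+1)}$ $(= \frac{1}{m})$. Let
$o_1, \ldots, o_m$ denote the $m$ values. It follows that

\[ \sum_{i=1}^{m} o_i = 0 \quad \text{and} \quad \sum_{i=1}^{m} o_i^2 = m \]

as $E(X_M) = 0$ and $\mathrm{var}(X_M) = 1$.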
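
A sketch of how the outcomes $o_1, \ldots, o_m$ could be computed: the Haar expansion of a mean-zero function truncated at level $M$ coincides with its averages over the dyadic intervals of length $1/m$, and dividing by $\sigma_M$ normalises the variance. The example quantile $X(x) = \sqrt{3}(2x - 1)$ (a uniform law with mean 0 and variance 1) and the function name `outcomes` are assumptions for illustration.

```python
import math

def outcomes(cell_means):
    """Normalise the dyadic cell averages of a mean-zero X to obtain o_1, ..., o_m."""
    m = len(cell_means)
    sigma_M = math.sqrt(sum(v * v for v in cell_means) / m)   # std. dev. of the truncation
    return [v / sigma_M for v in cell_means]

# Hypothetical example: X is linear, so its average over a cell is its midpoint value.
M = 2
m = 2 ** (M + 1)
means = [math.sqrt(3) * (2 * (i + 0.5) / m - 1) for i in range(m)]
o = outcomes(means)
print(sum(o), sum(v * v for v in o))   # approximately 0 and m
```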

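We will now take a closer look at the random variable $S_{n,M}$ of Equation (1). Each $X_{i,M}$ is a
random variable with the $m$ possible outcomes $o_1, \ldots, o_m$. Let $K_i$ be the random variable which
denotes the number of times the outcome $o_i$ is observed among $n$ independent trials, having
outcomes $k_i$. Then,

\[ S_{n,M}(x) = K_1(x) o_1 + \ldots + K_m(x) o_m \tag{3} \]

where $K_1 + \ldots + K_m = n$.

Note that $S_{n,M}$ is a scalar product of an $m$-nomial random variable and the vector of outcomes.
Since each outcome has probability $\frac{1}{m}$ and the trials are independent,

\[ \lambda\left(\{x \in (0,1) : K_1(x) = k_1, \ldots, K_m(x) = k_m\}\right) = \left(\frac{1}{m}\right)^n \binom{n}{k_1, \ldots, k_m} \]

and

\[ \int_0^1 f\!\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{m} K_i(x) o_i\right) d\lambda(x) = \sum_{k_1 + \ldots + k_m = n} f\!\left(\frac{1}{\sqrt{n}} \sum_{i=1}^{m} k_i o_i\right) \left(\frac{1}{m}\right)^n \binom{n}{k_1, \ldots, k_m}. \]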
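
For small $n$ and $m$ the multinomial representation can be evaluated by brute force. The sketch below enumerates all count vectors $(k_1, \ldots, k_m)$ with total $n$; the function names `q` and `expected_value` and the two-outcome example are assumptions, and the enumeration is only practical for very small cases.

```python
import math
from itertools import product

def q(n, ks):
    """Probability (1/m)^n * n! / (k_1! ... k_m!) of the count vector ks."""
    coeff = math.factorial(n)
    for k in ks:
        coeff //= math.factorial(k)    # each successive division is exact
    return coeff / len(ks) ** n

def expected_value(f, n, o):
    """E f((1/sqrt(n)) * sum_i K_i o_i), summing over all count vectors with total n."""
    m = len(o)
    total = 0.0
    for head in product(range(n + 1), repeat=m - 1):
        last = n - sum(head)
        if last < 0:
            continue                   # counts must add up to n
        ks = head + (last,)
        s = sum(k * oi for k, oi in zip(ks, o))
        total += f(s / math.sqrt(n)) * q(n, ks)
    return total

o = [-1.0, 1.0]                        # m = 2: sum is 0, sum of squares is m
print(expected_value(lambda t: 1.0, 6, o), expected_value(lambda t: t * t, 6, o))
```

With $f \equiv 1$ the sum returns the total probability, and with $f(t) = t^2$ it returns the variance of the standardized sum, both of which should equal 1.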


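Proposition 1. Let $X$ be a random variable on [0,1] having mean 0 and variance 1, let

\[ S_n(x) := \sum_{i=1}^{n} X(P_i(x)), \]

and for each $M > 0$, let $S_{n,M}$ be as in Equation (1) and $\sigma_M$ be as in Equation (2). Then, for
each bounded and continuous $f : \mathbb{R} \to \mathbb{R}$, and each $\varepsilon > 0$, there exists $M_0 \in \mathbb{N}$ such that for all
$M \ge M_0$,

\[ \left| \int_0^1 f\!\left(\frac{S_n(x)}{\sqrt{n}}\right) d\lambda(x) - \int_0^1 f\!\left(\frac{S_{n,M}(x)}{\sigma_M \sqrt{n}}\right) d\lambda(x) \right| < \varepsilon(6\|f\|_\infty + 1). \]

Proof. The proposition follows from Lemma 1 and the discussion following Lemma 1.

□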


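The following lemma allows us to cut off the tails from the multinomial random variable.
Consequently, we prepare the ground for the usage of Taylor's formula. The tails of the
multinomial random variable consist of all $(k_1, \ldots, k_m) \in \{0, 1, \ldots, n\}^m$ such that $k_1 + \ldots + k_m = n$ and
$k_i \notin \left[\lfloor \frac{n}{m} \rfloor - \lfloor b\sqrt{n} \rfloor, \lfloor \frac{n}{m} \rfloor + \lfloor b\sqrt{n} \rfloor\right]$ for some $1 \le i \le m - 1$.


Lemma 2. Let

\[ q(n, k_1, \ldots, k_m) := \left(\frac{1}{m}\right)^n \binom{n}{k_1, \ldots, k_m}. \]

Then, there exists a $b_0$ such that for all $b \ge b_0$:

\[
\left| \sum_{\substack{k_1 = 0 \\ k_1 + \ldots + k_m = n}}^{n} \cdots \sum_{k_m = 0}^{n} q(n, k_1, \ldots, k_m)
- \sum_{\substack{k_1 = \lfloor n/m \rfloor - \lfloor b\sqrt{n} \rfloor \\ k_1 + \ldots + k_m = n}}^{\lfloor n/m \rfloor + \lfloor b\sqrt{n} \rfloor} \cdots \sum_{k_{m-1} = \lfloor n/m \rfloor - \lfloor b\sqrt{n} \rfloor}^{\lfloor n/m \rfloor + \lfloor b\sqrt{n} \rfloor} q(n, k_1, \ldots, k_m) \right| < \varepsilon.
\]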


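Proof. Recall that $K_i$ is the random variable which denotes the number of times the outcome $o_i$
is observed, having values $k_i$. As each $K_i$ is a binomial random variable, we have $E(K_i) = \frac{n}{m}$
and $\mathrm{var}(K_i) = n\left(\frac{1}{m}\right)\left(1 - \frac{1}{m}\right)$. By Chebyshev's inequality,

\[ \lambda\left( \left| K_i - \frac{n}{m} \right| > b\sqrt{n} \right) \le \frac{1 - \frac{1}{m}}{b^2 m} \le \frac{1}{b^2 m}. \]

Then, there exists a $b_0$ such that for all $b \ge b_0$:

\[ \lambda\left( \left| K_i - \frac{n}{m} \right| > b\sqrt{n} \text{ for some } 1 \le i \le m \right) \le \frac{1}{b^2} < \varepsilon. \]

□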
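
The Chebyshev step can be compared with the exact binomial tail. The sketch below computes $P(|K - n/m| > b\sqrt{n})$ for $K \sim \mathrm{Binomial}(n, 1/m)$; the parameter values and the function name `binomial_tail` are assumptions for the example.

```python
import math

def binomial_tail(n, m, b):
    """Exact P(|K - n/m| > b * sqrt(n)) for K ~ Binomial(n, 1/m)."""
    p = 1.0 / m
    threshold = b * math.sqrt(n)
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - n / m) > threshold
    )

n, m, b = 400, 4, 1.5
single = binomial_tail(n, m, b)
print(single, (1 - 1 / m) / (b * b * m))   # exact tail vs. the Chebyshev bound
print(m * single, 1 / (b * b))             # union bound over i vs. 1 / b^2
```

The exact tail is typically far smaller than the Chebyshev bound, which reflects the coarseness of the inequality rather than any weakness of the argument.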


In the following lemma, we will use Stirling's formula and Taylor series to approximate the
probabilities for the multinomial distribution. For use here and in the proof of Theorem 2, we
define some functions. For $n > 0$, we let:

\[ d_n := \sqrt{m} \left( \frac{m}{2n\pi} \right)^{\frac{m-1}{2}}. \]

For $n > 0$ and integers $j_1, \ldots, j_m$ whose sum is 0, we let:

\[ H(n, j_1, \ldots, j_m) := \left( \frac{m^2}{4n^2} - \frac{m}{2n} \right) \sum_{i=1}^{m} j_i^2 + \frac{m^2}{6n^2} \sum_{i=1}^{m} j_i^3 - \frac{m^3}{12n^3} \sum_{i=1}^{m} j_i^4 + O(n^{-3/2}), \tag{5} \]

and

\[ p(n, j_1, \ldots, j_m) := d_n e^{H(n, j_1, \ldots, j_m)}. \tag{6} \]

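Lemma 3. Let $j_i = k_i - \frac{n}{m}$ and suppose that $-\lfloor b\sqrt{n} \rfloor \le j_i \le \lfloor b\sqrt{n} \rfloor$ for $1 \le i \le m - 1$,
$j_m = -j_1 - \ldots - j_{m-1}$, and $n$ is sufficiently large. Then,

\[ \frac{1}{m^n} \binom{n}{k_1, \ldots, k_m} = \left(1 + O\!\left(\tfrac{1}{n}\right)\right) p(n, j_1, \ldots, j_m). \]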

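Proof. Set

\[ l(n, k_1, \ldots, k_m) := \frac{1}{m^n} \binom{n}{k_1, \ldots, k_m}. \]

By Stirling's Formula, we have

\[ l(n, k_1, \ldots, k_m) = \frac{\left(1 + O\!\left(\frac{1}{n}\right)\right) (2\pi)^{\frac{1}{2}} n^{n + \frac{1}{2}}}{(2\pi)^{\frac{m}{2}} m^n \prod_{i=1}^{m} k_i^{k_i + \frac{1}{2}}}. \]

Letting $k_i = \frac{n}{m} + j_i$ for $1 \le i \le m$,

\[ l(n, k_1, \ldots, k_m) = \frac{\left(1 + O\!\left(\frac{1}{n}\right)\right) m^{\frac{m}{2}}}{(2\pi)^{\frac{m-1}{2}} n^{\frac{m-1}{2}} \left(1 + \frac{m j_1}{n}\right)^{\frac{n}{m} + j_1 + \frac{1}{2}} \cdots \left(1 + \frac{m j_m}{n}\right)^{\frac{n}{m} + j_m + \frac{1}{2}}}. \]

For all $1 \le i \le m$, we set

\[ a(n, m, i) := \left( \frac{n}{m} + j_i + \frac{1}{2} \right) \ln\left( 1 + \frac{m j_i}{n} \right), \]

so that the product in the denominator equals $\exp\left(\sum_{i=1}^{m} a(n, m, i)\right)$. Using a Taylor series
approximation, we have the following for $n$ large enough (as $j_i$ is bounded by $O(\sqrt{n})$):

\[
\begin{aligned}
a(n, m, i) &= \left( \frac{n}{m} + j_i + \frac{1}{2} \right) \left( \frac{m j_i}{n} - \frac{m^2 j_i^2}{2n^2} + \frac{m^3 j_i^3}{3n^3} - \frac{m^4 j_i^4}{4n^4} + O(n^{-5/2}) \right) \\
&= j_i + \frac{m j_i}{2n} + \frac{m j_i^2}{2n} - \frac{m^2 j_i^2}{4n^2} - \frac{m^2 j_i^3}{6n^2} + \frac{m^3 j_i^4}{12n^3} + O(n^{-3/2}).
\end{aligned}
\]

Summing over $i$ and using $j_1 + \ldots + j_m = 0$ gives $\sum_{i=1}^{m} a(n, m, i) = -H(n, j_1, \ldots, j_m)$.
Therefore, we have

\[ l(n, k_1, \ldots, k_m) = \left(1 + O\!\left(\tfrac{1}{n}\right)\right) p(n, j_1, \ldots, j_m), \]

as required.

□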
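
The approximation in Lemma 3 can be checked numerically by comparing the logarithm of the exact multinomial probability with $\log d_n + H$. In the sketch below, $H$ is truncated after the fourth-power term (the $O(n^{-3/2})$ remainder is dropped); the function names and the sample values of $n$ and $j_i$ are assumptions for the example.

```python
import math

def log_multinomial_prob(n, ks):
    """log of (1/m)^n * n! / (k_1! ... k_m!)."""
    m = len(ks)
    return math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in ks) - n * math.log(m)

def log_p(n, js):
    """log of d_n * exp(H), with H truncated after the fourth-power term."""
    m = len(js)
    log_d = 0.5 * math.log(m) + 0.5 * (m - 1) * math.log(m / (2 * math.pi * n))
    s2 = sum(j ** 2 for j in js)
    s3 = sum(j ** 3 for j in js)
    s4 = sum(j ** 4 for j in js)
    H = ((m ** 2 / (4 * n ** 2) - m / (2 * n)) * s2
         + m ** 2 / (6 * n ** 2) * s3
         - m ** 3 / (12 * n ** 3) * s4)
    return log_d + H

n, js = 4000, (10, -5, -3, -2)              # offsets sum to zero, m = 4
ks = [n // 4 + j for j in js]
print(log_multinomial_prob(n, ks) - log_p(n, js))   # small, of order 1/n
```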


Now we consider the Gaussian side. Let $Y_1, \ldots, Y_m$ be i.i.d. standard normal random variables.
Then, by the properties of i.i.d. normal random variables and the identity $\sum_{i=1}^{m} o_i^2 = m$,

\[ \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i Y_i \]

is a standard normal random variable.
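Lemma 4. Let $S$ be the hyperplane in $\mathbb{R}^m$ given by $\sum_{i=1}^{m} y_i = 0$. Then,

\[ E f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i Y_i \right) = E_{\mathcal{L}(Z)} f\!\left( \frac{1}{\sqrt{m}} o^T Z \right), \]

where $\mathcal{L}(Z)$ is a standard Gaussian $m - 1$ dimensional measure on the hyperplane $S$. Moreover,

\[ E_{\mathcal{L}(Z)} f\!\left( \frac{1}{\sqrt{m}} o^T Z \right) = \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i y_i \right) \exp\left( -\frac{1}{2} \sum_{i=1}^{m} y_i^2 \right) dy_1 \ldots dy_{m-1}, \]

where $y_m = -(y_1 + \ldots + y_{m-1})$.

Proof. Set

\[ o = \begin{pmatrix} o_1 \\ \vdots \\ o_m \end{pmatrix}, \quad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_m \end{pmatrix}, \quad u = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}. \]

Then, the vector projection of $Y$ in the direction of the vector $u$ is given by

\[ V = \frac{u^T Y}{u^T u}\, u = \left( \frac{1}{m} \sum_{i=1}^{m} Y_i \right) u. \]

Thus, $V$ is a one dimensional standard normal on the line through the origin and orthogonal
to the hyperplane $S$. Viewing $V$ and $Z := Y - V$ as vector valued random variables, $V$ and $Z$
are independent as random variables in addition to being orthogonal as vectors. This can be
verified by checking that all components of $Z$ are independent of all components of $V$. Since all
components of $V$ are equal to $\bar{Y} := \frac{1}{m} \sum_{i=1}^{m} Y_i$, and since the components of $Y - V$ are $Y_i - \bar{Y}$,
the independence of the Gaussian random variables follows as

\[ E((Y_i - \bar{Y})\bar{Y}) = E(Y_i \bar{Y}) - E(\bar{Y}^2) = \frac{1}{m} - \frac{1}{m} = 0 \]

for all $i = 1, \ldots, m$. Note that $o^T Y = o^T (Y - V) + o^T V$. Since $o^T u = 0$,

\[ \frac{1}{\sqrt{m}} o^T Y = \frac{1}{\sqrt{m}} o^T Z. \]

That is, $f\!\left( \frac{1}{\sqrt{m}} o^T Z \right)$ does not depend on $V$. Note that since the components of $Z$ satisfy
$\sum_{i=1}^{m} (Y_i - \bar{Y}) = 0$, $\mathcal{L}(Z)$, the law of $Z$, is a standard Gaussian $m - 1$ dimensional measure on the
hyperplane $S$. From the independence of $Z$ and $V$, and Fubini's Theorem, it follows that

\[ E f\!\left( \frac{1}{\sqrt{m}} o^T Y \right) = E_{\mathcal{L}(Z)} E_{\mathcal{L}(V)} f\!\left( \frac{1}{\sqrt{m}} o^T Y \right) = E_{\mathcal{L}(Z)} f\!\left( \frac{1}{\sqrt{m}} o^T Z \right). \]

Since the density of $Y$ is given by

\[ \frac{1}{(\sqrt{2\pi})^m} \exp\left( -\frac{1}{2} y^T y \right) = \frac{1}{(\sqrt{2\pi})^{m-1}} \exp\left( -\frac{1}{2} z^T z \right) \cdot \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{m}{2} (\bar{y})^2 \right), \]

where $y$, $z$, and $\bar{y}$ are the realizations of $Y$, $Z$, and $\bar{Y}$, respectively, it follows that

\[ E_{\mathcal{L}(Z)} f\!\left( \frac{1}{\sqrt{m}} o^T Z \right) = \frac{1}{(\sqrt{2\pi})^{m-1}} \int_S f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i y_i \right) \exp\left( -\frac{1}{2} \sum_{i=1}^{m} y_i^2 \right) dS. \]

That is, the expected value with respect to $\mathcal{L}(Z)$ is a surface integral over the hyperplane $S$. In
arriving at the last equality we have used that $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i = 0$ on $S$. By projecting $S$ onto the
$y_m = 0$ plane, we have

\[ E_{\mathcal{L}(Z)} f\!\left( \frac{1}{\sqrt{m}} o^T Z \right) = \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i y_i \right) \exp\left( -\frac{1}{2} \sum_{i=1}^{m} y_i^2 \right) dy_1 \ldots dy_{m-1}, \]

where $y_m = -\sum_{i=1}^{m-1} y_i$.

□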
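
The orthogonal decomposition $Y = V + Z$ used in the proof of Lemma 4 can be illustrated on a single sample. In this sketch the outcome vector `o`, the random seed, and the helper names are assumptions for the example; the check only relies on $\sum_i o_i = 0$.

```python
import random

def split(y):
    """Return (V, Z): the projection of y onto u = (1, ..., 1) and the remainder y - V."""
    mean = sum(y) / len(y)
    return [mean] * len(y), [yi - mean for yi in y]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

random.seed(0)
scale = (4 / 5) ** 0.5
o = [scale * v for v in (-1.5, -0.5, 0.5, 1.5)]   # hypothetical outcomes: sum 0, sum of squares 4
y = [random.gauss(0.0, 1.0) for _ in o]
V, Z = split(y)
print(dot(o, y) - dot(o, Z))   # ~0, because o is orthogonal to u
print(sum(Z), dot(V, Z))       # ~0 and ~0: Z lies in S and is orthogonal to V
```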

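Lemma 5. Let

\[ I := \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int_{y_1 + y_2 + \ldots + y_m = 0} f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i y_i \right) e^{-\frac{1}{2} \sum_{i=1}^{m} y_i^2}\, dy_1 \ldots dy_{m-1}. \]

Then, there exist $n_0 \in \mathbb{N}$ and $b_1 > 0$ such that for all $n \ge n_0$ and for all $b \ge b_1$,

\[
\left| I - \frac{m^{\frac{m}{2}}}{(\sqrt{2\pi n})^{m-1}} \sum_{\substack{j_1 = -\lfloor b\sqrt{n} \rfloor \\ j_1 + j_2 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_{m-1} = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i j_i \right) e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}} \right| < \varepsilon(\|f\|_\infty + 1).
\]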


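Proof. Let

\[ C := \{ y \in \mathbb{R}^m : y_i \in [-b\sqrt{m}, b\sqrt{m}] \text{ for all } 1 \le i \le m - 1 \text{ and } y_m = -(y_1 + \ldots + y_{m-1}) \}. \]

We have

\[ \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int_{y_1 + y_2 + \ldots + y_m = 0} \exp\left( -\frac{1}{2} \sum_{i=1}^{m} y_i^2 \right) dy_1 \ldots dy_{m-1} = 1. \]

Thus, there exists a $b_1$ such that for all $b \ge b_1$,

\[ \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int_{y \in C^c} e^{-\frac{1}{2} \sum_{i=1}^{m} y_i^2}\, dy_1 \ldots dy_{m-1} < \varepsilon, \]

and so:

\[ \left| I - \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int_{y \in C} f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i y_i \right) e^{-\frac{1}{2} \sum_{i=1}^{m} y_i^2}\, dy_1 \ldots dy_{m-1} \right| \le \varepsilon \|f\|_\infty. \]

Suppose that $b \ge b_1$. Let

\[ I_b := \frac{\sqrt{m}}{(\sqrt{2\pi})^{m-1}} \int \cdots \int_{y \in C} f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i y_i \right) e^{-\frac{1}{2} \sum_{i=1}^{m} y_i^2}\, dy_1 \ldots dy_{m-1}. \]

Then, with the substitution $y_i = \sqrt{\frac{m}{n}}\, j_i$,

\[ I_b = \lim_{n \to \infty} \frac{m^{\frac{m}{2}}}{(2\pi n)^{\frac{m-1}{2}}} \sum_{\substack{j_1 = -\lfloor b\sqrt{n} \rfloor \\ j_1 + j_2 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_{m-1} = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i j_i \right) e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}}. \]

Hence, there exists an $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$,

\[
\left| I - \frac{m^{\frac{m}{2}}}{(2\pi n)^{\frac{m-1}{2}}} \sum_{\substack{j_1 = -\lfloor b\sqrt{n} \rfloor \\ j_1 + j_2 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_{m-1} = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i j_i \right) e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}} \right| < \varepsilon(\|f\|_\infty + 1).
\]

□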
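
In the smallest case $m = 2$ (outcomes $o = (-1, 1)$) the hyperplane is a line and the Riemann sum of Lemma 5 is one-dimensional, so it can be evaluated directly. The sketch below does this for $f = \cos$, for which $E(f(Y)) = e^{-1/2}$; the function name `riemann_sum` and the values of $n$ and $b$ are assumptions for the example.

```python
import math

def riemann_sum(f, n, b, o):
    """d_n * sum over j_1 of f((o_1 j_1 + o_2 j_2) / sqrt(n)) * exp(-(m / 2n)(j_1^2 + j_2^2)), m = 2."""
    m = 2
    d_n = math.sqrt(m) * (m / (2 * math.pi * n)) ** ((m - 1) / 2)
    J = math.floor(b * math.sqrt(n))
    total = 0.0
    for j1 in range(-J, J + 1):
        j2 = -j1                                    # enforce j_1 + j_2 = 0
        arg = (o[0] * j1 + o[1] * j2) / math.sqrt(n)
        total += f(arg) * math.exp(-m / (2 * n) * (j1 * j1 + j2 * j2))
    return d_n * total

approx = riemann_sum(math.cos, 10000, 4.0, [-1.0, 1.0])
print(approx, math.exp(-0.5))
```

For larger $m$ the same sum runs over $m - 1$ free indices, and the cost grows like $(2\lfloor b\sqrt{n} \rfloor + 1)^{m-1}$.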


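Proposition 2. Let $Y$ be a standard normal random variable. Then, for each $\varepsilon > 0$, for each
bounded, continuous $f : \mathbb{R} \to \mathbb{R}$, there exist $n_0 \in \mathbb{N}$, $b_1 > 0$ such that for all $n \ge n_0$ and for all
$b \ge b_1$,

\[
\left| E(f(Y)) - \frac{m^{\frac{m}{2}}}{(2\pi n)^{\frac{m-1}{2}}} \sum_{\substack{j_1 = -\lfloor b\sqrt{n} \rfloor \\ j_1 + j_2 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_{m-1} = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i j_i \right) e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}} \right| < \varepsilon(\|f\|_\infty + 1).
\]

Proof. The statement is immediate from Lemma 4 and Lemma 5.

□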


3. Main Results 

Theorem 2 gives our proof of the CLT. The proof appeals to Theorem 1, which also establishes
strong convergence off the tails and gives a rate and constant of convergence there; the rate is
$n^{-1/2}$ and the constant of convergence is $\frac{2m^{9/2}}{3\sqrt{2\pi}}$. These only hold off the tails as we have truncated
the Haar expansions, the multinomial sum, and the Gaussian Riemann sum. Theorem 1 combines
the results of Propositions 1 and 2. From now on, let $b \ge \max\{b_0, b_1\}$, where the former is as in
Lemma 2 and the latter is as in the proof of Lemma 5.


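Theorem 1. Let

\[
D_n := \sum_{\substack{j_{m-1} = -\lfloor b\sqrt{n} \rfloor \\ j_1 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_1 = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} \left| p(n, j_1, \ldots, j_m) - \frac{m^{\frac{m}{2}}}{(2\pi n)^{\frac{m-1}{2}}} e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}} \right|.
\]

Then,

\[ D_n \le \frac{2m^{9/2}}{3\sqrt{2\pi n}} + O(n^{-1}). \]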

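Proof. By Lemma 3,

\[
D_n = \frac{m^{\frac{m}{2}}}{(2\pi n)^{\frac{m-1}{2}}} \sum_{\substack{j_{m-1} = -\lfloor b\sqrt{n} \rfloor \\ j_1 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_1 = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}} \left| e^{G(n, j_1, \ldots, j_m) + O(n^{-3/2})} - 1 \right|,
\]

where $G(n, j_1, \ldots, j_m) = \left( \frac{m^2}{4n^2} \right) \sum_{i=1}^{m} j_i^2 + \left( \frac{m^2}{6n^2} \right) \sum_{i=1}^{m} j_i^3 - \left( \frac{m^3}{12n^3} \right) \sum_{i=1}^{m} j_i^4$. Then,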


Dr, = 


(27m) 


ri [bVnJ 

^ E 


[bV"J 

E 




^G{n,ji,...,jm)+Oin L _ 1| < 


fm-l=-Lb%/riJ jl=-lbV^l 

jl+...+jm=0 


m 2 


(27m) 


[6V"J 

et E 


[bVnl 

E 


■T.T=i 


\G{n,ji,...,jm) +0{n ^)|. 


jm-l=-lbYn\ il=-Lt>V"J 

il+---+jm=0 


All of the terms which decay more quickly than $n^{-1/2}$ are absorbed into the error term $O(n^{-1})$. It
remains to compute the constant $C(m)$. Define

\bYn\ 


Ln .— 


m 2 


, 2 \ [bYn] 

9 


i: Cg'ii.p+fc 
















We let 


2 ^ [bV^i lb\/n] 


E„, := ^ ... ^ 


(27rn) 


(27rn) 2 \6n^ J 

i \ 6n2 / ^ 


jm-l = -[b^i jl=-[by^J \ i—1 / 

il+---+Jm = 0 


m / 2\ Lf>V"J L&V^J /m-1 

m 2 / m \ ^ Z,i=i it 

e V 


E 

jm-l = -[by/ni j-i=-[b^i 
h+...+Jm=0 

Approximating the sum by an integral, we have 

,2 ^ ^ 3/2 


E I-?'* 




£;(|A|3) + 0 (n 1 ) 

where A is a standard normal random variable. Thus, 

2 m{m — 1 ) 


Now, consider 


En < —7= — „ /- + 0{n 




”■ \&n‘^) {2TTn)^ ^ 


E 


_ / m -ym. 

^ 2 Z^i = l n J I 


By maximizing e 2 ® |a;p, 

/ 2 \ ^ 2/9 l^'WiJ L^V"! 

r. ^ I m^\ rn2 fn\^n ^ ^ 

" ~ \(yvi) (27rn)^^^ ^ ^ 

Approximating the sum by an integral, we have 


■^‘=1 " lg-3/233/2^ 


T„ < Vm 




Thus, 


Hence, we have 


F„<^e-3/233/2 + 0(n-i). 

oyn 


2771"^ 


a. + a. < ^ ■ + o(„-‘) < + o(,.->). 

V27r A^/n 6Vn 3v27rn 


It then follows that

\[ D_n \le \frac{2m^{9/2}}{3\sqrt{2\pi n}} + O(n^{-1}). \]

□
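
The decay of $D_n$ can be observed numerically in a small case. The sketch below takes $m = 3$ and, as an assumption, uses the exact multinomial probabilities in place of $p(n, j_1, \ldots, j_m)$ (by Lemma 3 the two agree up to a factor $1 + O(1/n)$). The function name `D`, the choice $b = 2$, and the values of $n$ are assumptions for the example; it does not attempt to reproduce the constant of Theorem 1.

```python
import math

def D(n, b, m=3):
    """Sum over the central region of |multinomial probability - d_n * exp(-(m / 2n) * sum j_i^2)|."""
    assert m == 3 and n % m == 0       # sketch restricted to three outcomes
    J = math.floor(b * math.sqrt(n))
    log_d = 0.5 * math.log(m) + 0.5 * (m - 1) * math.log(m / (2 * math.pi * n))
    total = 0.0
    for j1 in range(-J, J + 1):
        for j2 in range(-J, J + 1):
            js = (j1, j2, -j1 - j2)    # offsets sum to zero
            ks = [n // m + j for j in js]
            if min(ks) < 0:
                prob = 0.0             # count vector outside the multinomial support
            else:
                log_q = (math.lgamma(n + 1)
                         - sum(math.lgamma(k + 1) for k in ks)
                         - n * math.log(m))
                prob = math.exp(log_q)
            gauss = math.exp(log_d - m / (2 * n) * sum(j * j for j in js))
            total += abs(prob - gauss)
    return total

for n in (300, 1200):
    value = D(n, 2.0)
    print(n, value, math.sqrt(n) * value)   # sqrt(n) * D_n should stay of the same order
```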


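Finally, we prove the CLT using Lemmas 1-5 and Theorem 1. Recall from Lemma 2 that:

\[ q(n, k_1, \ldots, k_m) := \left(\frac{1}{m}\right)^n \binom{n}{k_1, \ldots, k_m}. \]

Recall from Lemma 3 that:

\[ d_n := \sqrt{m} \left( \frac{m}{2n\pi} \right)^{\frac{m-1}{2}}. \]

For $n > 0$ and integers $j_1, \ldots, j_m$ whose sum is 0:

\[ H(n, j_1, \ldots, j_m) := \left( \frac{m^2}{4n^2} - \frac{m}{2n} \right) \sum_{i=1}^{m} j_i^2 + \frac{m^2}{6n^2} \sum_{i=1}^{m} j_i^3 - \frac{m^3}{12n^3} \sum_{i=1}^{m} j_i^4 + O(n^{-3/2}), \]

and

\[ p(n, j_1, \ldots, j_m) := d_n e^{H(n, j_1, \ldots, j_m)}. \]
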
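Theorem 2. Let $(X_i)$ be a sequence of i.i.d. random variables with mean $\mu$ and variance $\sigma^2$.
Let $f : \mathbb{R} \to \mathbb{R}$ be a bounded, continuous function. Then, for each $\varepsilon > 0$, there exists $n_1 \in \mathbb{N}$ such
that

\[ \left| E\!\left( f\!\left( \frac{X_1 + \ldots + X_n - n\mu}{\sigma\sqrt{n}} \right) \right) - E(f(Y)) \right| < \varepsilon(9\|f\|_\infty + 2) \]

for all $n \ge n_1$, where $Y$ is a standard normal random variable.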


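Proof. By Lemma 1, we reduce the problem to dealing with the projection of a multinomial
random variable and we have

\[
\begin{aligned}
\Lambda_n := \left| E\!\left( f\!\left( \frac{S_n}{\sqrt{n}} \right) \right) - E(f(Y)) \right|
&\le \left| E\!\left( f\!\left( \frac{S_{n,M}}{\sigma_M \sqrt{n}} \right) \right) - E(f(Y)) \right| + \varepsilon(6\|f\|_\infty + 1) \\
&= \left| \sum_{\substack{k_1 = 0 \\ k_1 + \ldots + k_m = n}}^{n} \cdots \sum_{k_m = 0}^{n} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i k_i \right) q(n, k_1, \ldots, k_m) - E(f(Y)) \right| + \varepsilon(6\|f\|_\infty + 1).
\end{aligned}
\]

By Lemma 2, we cut off the tails of the multinomial random variable to obtain

\[
\Lambda_n \le \left| \sum_{\substack{k_1 = \lfloor n/m \rfloor - \lfloor b\sqrt{n} \rfloor \\ k_1 + \ldots + k_m = n}}^{\lfloor n/m \rfloor + \lfloor b\sqrt{n} \rfloor} \cdots \sum_{k_{m-1} = \lfloor n/m \rfloor - \lfloor b\sqrt{n} \rfloor}^{\lfloor n/m \rfloor + \lfloor b\sqrt{n} \rfloor} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i k_i \right) q(n, k_1, \ldots, k_m) - E(f(Y)) \right| + \varepsilon(7\|f\|_\infty + 1).
\]

By Lemma 3, we further simplify the multinomial sum to obtain

\[
\Lambda_n \le \left| \sum_{\substack{j_{m-1} = -\lfloor b\sqrt{n} \rfloor \\ j_1 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_1 = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} p(n, j_1, \ldots, j_m)\, f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i j_i \right) - E(f(Y)) \right| + \varepsilon(7\|f\|_\infty + 1).
\]

Writing $Y$ as a sum of $m$ independent standard normal random variables, it follows that
$\frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i Y_i = N(0,1)$. By Lemma 4,

\[ E(f(Y)) = E\!\left( f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i Y_i \right) \right) = E_{\mathcal{L}(Z)} f\!\left( \frac{1}{\sqrt{m}} \sum_{i=1}^{m} o_i Z_i \right). \]

By Lemma 5, we approximate the Gaussian integral by a Riemann sum. This approximation
allows us to match the multinomial side and the Gaussian side and apply Theorem 1:

\[
\begin{aligned}
\Lambda_n &\le d_n \left| \sum_{\substack{j_{m-1} = -\lfloor b\sqrt{n} \rfloor \\ j_1 + \ldots + j_m = 0}}^{\lfloor b\sqrt{n} \rfloor} \cdots \sum_{j_1 = -\lfloor b\sqrt{n} \rfloor}^{\lfloor b\sqrt{n} \rfloor} f\!\left( \frac{1}{\sqrt{n}} \sum_{i=1}^{m} o_i j_i \right) \left( e^{H(n, j_1, \ldots, j_m)} - e^{-\frac{m}{2} \sum_{i=1}^{m} \frac{j_i^2}{n}} \right) \right| + \varepsilon(8\|f\|_\infty + 2) \\
&\le \|f\|_\infty \left( \frac{2m^{9/2}}{3\sqrt{2\pi n}} + O(n^{-1}) \right) + \varepsilon(8\|f\|_\infty + 2) < \varepsilon(9\|f\|_\infty + 2)
\end{aligned}
\]

for all $n \ge n_1$.

□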